A Bayesian approach to identifying the role of hospital structure and staff interactions in nosocomial transmission of SARS-CoV-2

Nosocomial infections threaten patient safety, and were widely reported during the COVID-19 pandemic. Effective hospital infection control requires a detailed understanding of the role of different transmission pathways, yet these are poorly quantified. Using patient and staff data from a large UK hospital, we demonstrate a method to infer unobserved epidemiological event times efficiently and disentangle the infectious pressure dynamics by ward. A stochastic individual-level, continuous-time state-transition model was constructed to model transmission of SARS-CoV-2, incorporating a dynamic staff–patient contact network as time-varying parameters. A Metropolis–Hastings Markov chain Monte Carlo (MCMC) algorithm was used to estimate transmission rate parameters associated with each possible source of infection, and the unobserved infection and recovery times. We found that the total infectious pressure exerted on an individual in a ward varied over time, as did the primary source of transmission. There was marked heterogeneity between wards; each ward experienced unique infectious pressure over time. Hospital infection control should consider the role of between-ward movement of staff as a key infectious source of nosocomial infection for SARS-CoV-2. With further development, this method could be implemented routinely for real-time monitoring of nosocomial transmission and to evaluate interventions.


Introduction
Healthcare-associated infections are a significant burden to health systems worldwide, and are associated with increased morbidity and mortality [1].Transmission of SARS-CoV-2 in healthcare settings was widely reported during the first wave of the COVID-19 pandemic [2][3][4][5].Hospital patients are considered to be especially vulnerable to severe COVID-19 and healthcare workers have been shown to be at an increased risk of infection [6][7][8].Identifying nosocomial transmission routes of SARS-CoV-2 is therefore critical to patient and staff safety.Universal point-of-care testing of patients and staff was not available during the initial stages of the pandemic in the UK to rapidly identify and isolate individuals.A triage system was recommended in hospitals to isolate and test patients with suspected COVID-19 [9].Although hospitals employed a range of infection prevention control (IPC) protocols to minimize transmission, high levels of hospital infections were reported [2,10].Healthcare-associated infections, or nosocomial infections, are defined as infections that occur either as a direct result of healthcare interventions or from being in contact with a healthcare setting [11].
To determine nosocomial transmission routes of a respiratory pathogen, the network of transmission opportunities within the hospital, hereafter called the contact network, must be considered.The routine running of a hospital in the UK, comprising wards, bays and side rooms, can give rise to additional structural and network transmission opportunities.Partitioning of patients into wards may be by diagnosis, severity of illness, or by infection status during an outbreak.Patients may have direct contact with staff, visitors and patients in the same ward, but also indirect contact with other patients and staff through shared equipment and objects, relevant to fomite transmission, airflow, relevant to airborne transmission, and staff acting as vectors for infectious diseases.Modelling studies have been conducted to investigate nosocomial transmission routes of numerous pathogens, most commonly Methicillin-resistant Staphylococcus aureus (MRSA) and Vancomycin-resistant Enterococcus (VRE) [12].While transmission from patients to patients and healthcare workers to patients has been identified, there remains a lack of consensus as to the primary routes of nosocomial infection [7,13,14].
One of the challlenges of identifying transmission routes in a hospital is distinguishing between hospital-acquired and community-acquired infections.Knight et al. report large uncertainty in the classification of nosocomial and community-acquired infections of SARS-CoV-2 during the first part of 2020 [15].Routinely collected hospital data tend to record a timestamp of when a patient tested positive for an infection.For COVID-19, this provides a marker as to when a patient was infectious.However, the time that a patient became infected is unobserved, and the time a patient recovered from infection and ceased to be infectious is also often unobserved.This makes the inference for nosocomial models much more difficult and computationally intensive, since we need to explore all possible configurations of event times consistent with the epidemiological landscape (such as contact networks and physical proximity within the hospital).
Generally, surveillance data only capture a partially observed epidemic process, and therefore requires a form of data augmentation to estimate unobserved epidemiological event times.O'Neill & Roberts initially introduced a Bayesian data augmentation approach to inference of general stochastic epidemic models, where unobserved event times are treated as parameters to be estimated [16].This has proved to be a popular method for conducting inference on partially observed epidemic models [17,18].A drawback of this method is that repeatedly calculating the likelihood can become extremely costly computationally and its use is therefore limited by population size.As an alternative method, McKinley et al. demonstrate using approximate likelihood ratios to conduct inference on partially observed epidemics, though this relies on repeatedly simulating the epidemic [19].Previous studies which model nosocomial transmission have used data augmentation to handle unobserved infection times [20,21].However, these studies do not include a time-varying contact network parameter to model interactions between staff and patients.
Here, we use routinely collected data from a large acute-care hospital in the UK to quantify the temporal and network dynamics of nosocomial transmission of SARS-CoV-2.We demonstrate a Bayesian approach to conducting inference with a time-varying covariate, a fine-scale patient-staff contact network, to estimate unobserved epidemiological event times and provide insight into within-hospital transmission dynamics.

Covariate data
Patient testing data from a large acute-care hospital in the UK were used in conjunction with staff rota data (staff shift times and ward assignments) to identify routes of nosocomial transmission of SARS-CoV-2 during the first wave of the COVID-19 pandemic.A patient pathway was developed with a colour-coded ward system to separate patients based on their SARS-CoV-2 infection status; figure 1. Universal testing for SARS-CoV-2 was not available for patient admissions or staff during our study period.Patient diagnosis was therefore based on clinical suspicion of COVID-19 and a confirmatory test.Some wards (e.g. the isolation ward) could be assigned different colour areas within a single ward based on the availability of side rooms (enclosed patient rooms), which might be treated differently to communal areas.Ward

Hospital network
Patient movements (ward transfers, admission and discharges) and staff shift times were recorded continuously in time.However, the recorded times of these movements were not likely to be exact in practice, and the volume of changes to the patient and staff structure leads to prohibitively large data structures in RAM.Hence, patient and staff movements were aggregated to 1 h time intervals to form the basis of our discrete-time contact network.
The hospital contact network is represented in three distinct ways.Firstly, a weighted ward connectivity tensor, C, of shape [p × p × T], where element c qrt is the connectivity between ward q and r at time t, in units of number of staff working across multiple wards.This primarily consists of doctors who are assigned to a group of wards per shift.Similarly, we define a spatial adjacency tensor of shape [ p × p × T ] denoted W, where w qrt is the number of staff using the kitchen if the wards share a kitchen, and 0 otherwise.For example, in figure 2, wards A and B share kitchen A, so w abt equals the total number of staff allocated to side A (wards A-F) on that floor of the hospital.Lastly, a membership tensor of shape [N × p × T ] denoted M, where m iqt = 1 if individual i is a member of ward q at time t and zero otherwise.Visualizations of the contact network can be found in the electronic supplementary material.
We introduce the dot subscript notation to indicate a slice of a tensor.For example, C q•t refers to a slice of tensor C along the second axis, which can be thought of as a vector representing the connectivity between ward q and all hospital wards (8r) at time t.

Modelling
In this section, we first describe the model structure in terms of the data generating process, before outlining our approach to Bayesian inference with the appropriate likelihood function, posterior distribution and Markov chain Monte Carlo (MCMC) algorithm.

Model structure
Transmission of SARS-CoV-2 is represented here by a stochastic individual-level, continuous-time state-transition model.At any point in time, patients are considered to belong to one of four mutually exclusive epidemiological states: susceptible to infection (S), infected but not infectious (E), infected and infectious (I) or recovered/removed from the population (R).We denote the sets of individuals in each state at time t as S(t), E(t), I(t), R(t), respectively, and for convenience write X(t) = {S(t), E(t), I(t), R(t)}.Any particular patient, i is assumed to progress between the states according to the transitions [SE], [EI] and [IR] (where [PQ] denotes a transition from state P to state Q) with transition rates l SE i ðtÞ, l EI i ðtÞ and l IR i ðtÞ, respectively, as defined below.royalsocietypublishing.org/journal/rsif J. R. Soc.Interface 21: 20230525 At time t, the rate at which patient i = 1, …, N becomes infected, i.e. l SE i ðtÞ, is assumed to be a function of the timeevolving infectious landscape.Within the hospital, we assume that i experiences infection pressure from four separate sources: other patients in the same ward; patients in other wards connected by staff being assigned to more than one ward; the spatial structure of wards connected by adjoining kitchens; constant 'background' infection, representing sources of infection not explicitly modelled by the hospital structure.Within-hospital routes of infection are represented by our patient-staff contact networks (C, W, M).Within the continuous-time model, changes to the patient-ward and staff-ward structures (either via patient movements or staff-ward allocation) are assumed to occur at discrete intervals of 1 hour, aggregating the precisely time-stamped patient movement and staff shift data to the hour.Continuous time was used to model the epidemic process to account for the fluidity of events in the hospital through time, thus avoiding the necessary act of choosing a time step.In addition, this leads to more efficient sampling from the posterior distribution of censored event times, avoiding large swings in posterior density that otherwise occur in discrete-time systems.
The rate at which an individual transitions from state S to E can be described as a time-dependent infectious pressure.We assume that infections occur at points of a right-continuous-time inhomogeneous Poisson process at a rate equal to the sum of infectious pressure on susceptibles immediately before that time point.The infectious pressure is density dependent and is defined at the individual level.We account for the number of infected on each ward at time t, the patient-staff contact network at time t and a background infection rate.An individual can either be present in the hospital H (i ∈ H) and admitted to ward q (i ∈ q), or in the community (i Ó H).The infectious pressure on a susceptible individual i (i ∈ S(t)) at time t can be defined as where β 1 denotes the transmission rate for within-ward mixing and I q (t) represents the number of infected individuals on ward q at time t.β 2 and β 3 are transmission rates for between-ward mixing, with C q•t denoting the connectivity between ward q and all other wards at time t, and W qÁt describing the connectivity between ward q and all other wards by ward proximity at time t.I(t) denotes a vector whose q th entry is equal to I q (t) and thus represents the number of infected individuals on each ward at time t.β 4 is a background hospital transmission rate.To allow for individuals to be infected before and between hospital admissions, we define a constant infectious pressure β 5 that is exerted on to susceptibles in the community.Similarly, we can define an individual's [EI] transition rate as follows: where i ∈ I(t) represents an individual i in the I state at time t.
Thus the [EI] and [IR] transition rates are assumed to be constant across individuals and time, and for identifiability reasons, we fix a ¼ 1=4 day À1 and g ¼ 1=5 day À1 , respectively [22].
The epidemic process is assumed to be Markovian.The data generating process is outlined in algorithm 1.
The epidemic process is assumed to be Markovian.The data generating process is outlined in Algorithm 1.

Bayesian inference
A Metropolis-Hastings MCMC algorithm is used to estimate the transmission rate parameters θ = {β 1 , β 2 , β 3 , β 4 , β 5 } and the unobserved [SE] and [IR] state transition times for each infection event according to the method of Jewell et al. [18].Each infection royalsocietypublishing.org/journal/rsif J. R. Soc.Interface 21: transition event is associated with an observed [EI] transition time, where an individual's [EI] transition time is assumed to be 2 days before the collection of their first positive swab.A sensitivity analysis was conducted to assess the robustness of our findings given different delays between a patients first positive test and their [EI] transition time.The inclusion of a time-varying covariate, the contact networks, in the continuous-time model adds to the complexity of computing the likelihood.We define the likelihood of observing (t, k) transitions, in terms of an ordered event list, where an event is considered to be a transition event (transitioning between states: S → E, E → I or I → R) or a hospital network update, as follows: where t and k are the (time-ordered) lists of event times and indices as in algorithm 1, X(0) denotes the initial conditions, θ the parameters as above, and covariate data C, W and H. h(t k ) is a vector of length 3N + 1 of hazard rates for the three transitions, with N = 2981 individuals and a covariate marker to indicate a hospital network update.That is, hðtÞ ¼ ðl SE 1 ðtÞ, . . ., l SE N ðtÞ, l EI 1 ðtÞ, . . ., l EI N ðtÞ, l IR 1 ðtÞ, . . ., l IR N ðtÞ, 1Þ ⊺ similar to algorithm 1.Thus h k l ðt l Þ denotes the hazard rate for the k l th transition at time t l .Note that the trailing 1 corresponds to a contact matrix change (a hospital network update), such that for a timepoint at which the contact matrices change the likelihood term represents the probability of surviving an epidemiological state transition in the preceding time interval.
The following prior distributions were chosen for all transmission parameters (β 1 , β 2 , β 3 , β 4 , β 5 ) denoted by f (θ).The prior distributions were chosen to favour small beta values which were still bound by zero.b i iid Gammað1:1, 1000Þ: ð3:5Þ The joint conditional posterior can therefore be defined as where z = ([SE], [IR]) and t z and t −z denote partitioning of elements of the lists t into unobserved and observed events, respectively (and similarly for k).As the likelihood is intractable to integration, a Metropolis-within-Gibbs MCMC algorithm is used to sample from the joint posterior.Full details of the MCMC algorithm used can be found in the electronic supplementary material.To evaluate chain convergence, the Gelman and Rubin potential scale reduction statistic is calculated for three chains with differing starting values drawn from the prior distribution [23].The posterior predictive distribution is analysed visually to assess how well our modelled estimates fit the observed data.

Infection hazard attributable fraction
Having computed the joint posterior distribution, we are able to investigate how within-ward and between-ward dynamics contribute to nosocomial transmission.We calculate the attributable fraction, denoted AF, per infection for each transmission type: within-ward transmission β 1 ; between-ward transmission rates β 2 and β 3 ; background hospital transmission rate β 4 ; community transmission rate β 5 .For example, the attributable fraction for individual i infected from between-ward transmission on ward q can be defined as follows: i ð e tÞ , e t ¼ t idxðk¼iÞ , ð3:7Þ where idx(k = i) denotes the index of the element of k which is equal to i (i.e.corresponding to the [SE] event for individual i).
The attributable fractions are aggregated for each set of sampled parameter estimates to compute the mean attributable fraction and 95% credible intervals (95%CrI) for each transmission type per infection.Similarly, using the sampled parameter estimates, we can identify which wards and associated ward colours nosocomial infections take place in.

Results
The results presented are of SARS-CoV-2 infections recorded in a UK hospital one month into the first wave of the pandemic.In a population size of 2981 patients from 12 April to 10 May 2020, we have identified 131 infection events, [EI] transitions, which are a combination of community-acquired and nosocomially acquired infections.Historic testing data, 3 days prior to our study start date, were used to identify the 58 patients which are considered to be initially infectious, residing in the I state, in our model.

Parameter estimation
To estimate our parameters of interest (β 1 , β 2 , β 3 , β 4 , β 5 , t z ), we ran a Metropolis-Hastings MCMC algorithm for 11 000 iterations removing the first 1000 samples as burn-in.Convergence across all parameters is seen which is confirmed by the potential scale reduction statistic computed for three independent chains; see electronic supplementary material, table S1.The effective sample size was also computed for each transmission rate parameter for three independent chains; see electronic supplementary material.In figure 3, we present the kernel density estimates for each of the transmission rate parameters and for a random sample of transition event times.A constraint when drawing event times, is that an individual must have been admitted to hospital before their [IR] transition time.As the observed [EI] transition times are set to 2 days prior to a patient's first positive hospital test, there may be a period of time pre-admission which is unexplored by the MCMC algorithm, as seen with sample 1 and 6 of figure 3b.
The posterior predictive distribution is used to evaluate how well our model fits the observed data.An epidemic is simulated forward using the data generating model (algorithm 1) for each of the 10 000 sets of parameter estimates sampled by the MCMC algorithm.We compare the number of [EI] transitions occurring each day from the simulations with the observed data; figure 4. With the exception of one peak of [EI] transitions on day 12, the observed data sit comfortably within the 95%CrI of the aggregated simulation data.Figure 4 shows that in a given simulation, the peaks and troughs of the number of [EI] transitions per day may similarly sit outside of the credible interval.

Nosocomial transmission routes
One of the parameters of interest is the associated [SE] transition time for each infection.Estimating this enables us to identify which infections were most likely to be contracted nosocomially rather than in the community.Moreover, for each infection, we analyse how the different types of transmission contribute to the infectious pressure exerted on an individual directly before their infection event.Hospital-acquired infections are likely for 15.3% (20/131) of detected infections.These infections have a mean community attributable fraction of less than 0.5, with the majority (15/20) having a mean community attributable fraction of less than 0.1; figure 5. Between-ward transmission is the highest mean attributable fraction for 13 of the 20 infections identified as nosocomial.Infection events of patient 4 and patient 15 have the highest mean attributable fractions for between-ward transmission of 0.72 (95%CrI 0.12-0.99)and 0.83 (95%CrI 0.29-1.00);figure 5. Four nosocomial infections have a mean within-ward attributable fraction of zero, indicating that these infections occurred when there were no infectious individuals admitted to the individual's ward.
To assess the plausibility of our model output, we further explored the hospital data associated with the 15 individuals with the lowest community attributable fraction.One individual appeared to have been tested on the day they were admitted to hospital; on further inspection, this individual had been discharged from a 15-day hospital spell 2 days prior to re-admission which was captured within our study period.The other 14 individuals identified had been admitted to hospital for an average of 13.3 days (95%CrI 6.62-22.76)before their positive test sample was collected.
Using our modelling framework, we are able to identify the wards in which nosocomial infections occurred.Nosocomial infections are identified based on whether an individual's [SE] transition time is during a hospital admission spell.For each set of sampled parameters, the percentage distribution of nosocomial infections by ward is calculated and aggregated for the 10 000 samples.We  ) and wards which are assigned three colours of green, red and yellow (13.6%, 95%CrI 6.67-25.00).
The modelling suggests that the infectious pressure exerted on an individual in a ward changes considerably over time, this may be explained by the dynamic contact network.The within-ward infectious pressure will increase with the number of infectious individuals on the ward.Similarly, if there is an increase of infectious individuals on wards considered to be connected, the between-ward infectious pressure will increase.Interestingly, the source of infectious pressure exerted on individuals in the four wards that recorded the highest number of nosocomial infections was also dynamic; figure 6.For the first 23 days of our study period, each of these four wards were classified as green wards with the exception of ward 23 which was a mixed ward, either classified as green-red-yellow or green-red ward.We find infectious pressure dynamics varied by ward and ward colour; figure 6.An individual on ward 33 would have experienced an infectious pressure driven by between-ward dynamics whereas individuals on wards 13 and 22 would have experienced clear peaks in within-ward infectious pressure.
We conducted a sensitivity analysis of the delay between an individual's [EI] transition time and an individual's first positive test in hospital to assess the robustness of our results; electronic supplementary material, figures S3-S5.Delays of 1 day and 3 days were considered in addition to the 2 days used for our primary analysis.A by-product of altering the [EI] transition times is that the initial conditions of our model change, with a different number of individuals initially residing in the I state; electronic supplementary material, table S3.The number of nosocomial infections identified therefore varied.However, between-ward dynamics were found to be the key driver of infectious pressure in each analysis, and the same four wards were identified as housing the most nosocomial infections.

Discussion
This study provides a statistical framework for conducting inference on hospital outbreak dynamics to quantify the relative contribution of transmission routes for nosocomial infections.We have demonstrated that transmission parameters and unobserved event times can be inferred by incorporating a discrete time-varying patient-staff contact network and testing data into a continuous-time stochastic epidemic model.By estimating these parameters, we are able to disentangle the different components of infectious pressure and identify routes of nosocomial transmission of SARS-CoV-2 during the first wave of the pandemic.While computing the likelihood for these models can be computationally intensive, we found that with GPU acceleration (1 × NVIDIA V100 32GB), 11 000 iterations of our MCMC took approximately 61 min.
We estimate that 15.3% of identified SARS-CoV-2 infections in patients identified in hospital were nosocomially acquired, other studies which examined hospital infections during the first wave of the pandemic have reported similar findings [2,6,13,26].Nonetheless, we expect that this is an under-representation of the full extent of within-hospital transmission.Asymptomatic infections are not captured in this study as universal testing of admissions had not yet been introduced.Similarly, if a patient had been discharged while unknowingly infected, their infection would be unrecorded.A simple method which is used to identify nosocomial infections is to consider the time interval between admission and symptom onset.However, this can lead to an infected patient with multiple hospital spells in quick succession being misclassified as a community-acquired infection [15].The modelling

Figure 2 .
Figure 2. A schematic illustration of the hospital layout.Each floor of the hospital has two kitchens, one on side A and one on side B. The spatial adjacency tensor W, as defined previously, would consider all wards on the same floor and side of the hospital as connected.The number and size of wards on each floor of the hospital vary.

Figure 3 .Figure 4 .
Figure 3. (a) Kernel density estimates for each transmission rate parameter.Dashed lines represent the associated prior distribution.(b) Density estimates of [SE] and [IR] transition times for a random sample of eight observed [EI] events.Distributions for the associated [SE] and [IR] transition times are shown in grey and blue, respectively.The observed [EI] transition time for each randomly selected individual is represented as a green circle positioned between the [SE] and [IR] distributions.

Figure 5 .
Figure 5. Mean attributable fraction for each transmission type per infection event for 10 000 posterior samples.Infection events are displayed if they have a mean community transmission attributable fraction less than 0.5.

Figure 6 .
Figure 6.Mean infectious pressure for an individual on the stated ward at each [SE] transition, calculated for 10 000 posterior samples.The wards displayed are with the highest mean number of nosocomial infections.
colours could also change over time in view of capacity demands.Personal protective equipment (PPE) guidance for staff differed by ward colour.Our study period covered a four-week time span from 12 April to 10 May 2020.During this time, 3816 staff worked at least one shift at the hospital, consisting of 2948 healthcare

Table 1 .
Percentage of total nosocomial infections by designated ward colour.